
Deep Reinforcement Learning --- Please answer all questions

Assignment 01 – Problem Statement

13 Marks

Title: Propose a suitable title

Problem Statement: Define a problem statement of your own with a well-defined objective, gaming environment, and game controls. [1 Mark]

Concept Sketch: A pen-and-paper game concept sketch to illustrate the proposed gaming problem statement. [1 Mark]

Additional Information: Provide any necessary information assumed or considered for the game implementation.

Requirements and Deliverables:

Elaborate on how the described problem could be solved using a deep neural network, and explain the action plan for creating the gaming environment. [1 Mark]

Prepare a Colab notebook, with outputs saved, that satisfies the following requirements. The implementation should use OpenAI Gym with Python. Develop a deep neural network architecture and training procedure that effectively learns the optimal policy for the spaceship to avoid collisions with asteroids and maximize its survival time in the game environment.

i. Environment Setup: Define the game environment, including the state space, action space, rewards, and terminal conditions. [1.5 Marks]
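As a starting point, a minimal sketch of such an environment is shown below, written against the classic OpenAI Gym API (reset/step). The class name SpaceshipEnv, the grid size, the three-action control scheme, and the reward values (+1 per survived step, -10 on collision) are illustrative assumptions, not requirements of the assignment.

```python
import numpy as np
import gym
from gym import spaces

class SpaceshipEnv(gym.Env):
    """Toy spaceship-vs-asteroids environment on a small 2-D grid (illustrative only)."""

    def __init__(self, grid_size=10, n_asteroids=3, max_steps=200):
        super().__init__()
        self.grid_size = grid_size
        self.n_asteroids = n_asteroids
        self.max_steps = max_steps
        # Actions: 0 = move left, 1 = stay, 2 = move right
        self.action_space = spaces.Discrete(3)
        # State: one grid_size x grid_size occupancy plane (spaceship + asteroids)
        self.observation_space = spaces.Box(0.0, 1.0, shape=(grid_size, grid_size), dtype=np.float32)

    def reset(self):
        self.steps = 0
        self.ship_x = self.grid_size // 2  # ship sits on the bottom row
        self.asteroids = [(np.random.randint(self.grid_size), 0)
                          for _ in range(self.n_asteroids)]  # (x, y), falling from the top
        return self._get_obs()

    def step(self, action):
        self.steps += 1
        self.ship_x = int(np.clip(self.ship_x + (action - 1), 0, self.grid_size - 1))
        # Asteroids fall one row per step; respawn at the top after leaving the grid
        self.asteroids = [(x, y + 1) if y + 1 < self.grid_size
                          else (np.random.randint(self.grid_size), 0)
                          for x, y in self.asteroids]
        crashed = any(x == self.ship_x and y == self.grid_size - 1 for x, y in self.asteroids)
        done = crashed or self.steps >= self.max_steps
        reward = -10.0 if crashed else 1.0  # +1 per survived step, -10 on collision (assumed values)
        return self._get_obs(), reward, done, {}

    def _get_obs(self):
        obs = np.zeros((self.grid_size, self.grid_size), dtype=np.float32)
        for x, y in self.asteroids:
            obs[y, x] = 1.0
        obs[self.grid_size - 1, self.ship_x] = 1.0
        return obs
```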

ii. Replay Buffer: Implement a replay buffer to store experiences (state, action, reward, next state, terminal flag). [1.5 Marks]
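A replay buffer satisfying this requirement might be sketched as follows; the deque-backed storage, the default capacity, and uniform random sampling are common conventions assumed here for illustration.

```python
import random
from collections import deque

import numpy as np

class ReplayBuffer:
    """Fixed-size buffer of (state, action, reward, next_state, done) transitions."""

    def __init__(self, capacity=100_000):
        self.buffer = deque(maxlen=capacity)  # oldest experiences are dropped automatically

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform random sampling breaks the temporal correlation between consecutive steps
        batch = random.sample(self.buffer, batch_size)
        states, actions, rewards, next_states, dones = map(np.array, zip(*batch))
        return states, actions, rewards, next_states, dones

    def __len__(self):
        return len(self.buffer)
```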

iii. Deep Q-Network Architecture: Design the neural network architecture for the DQN using Convolutional Neural Networks. The input to the network is the game state, and the output is the Q-values for each possible action. [2 Marks]
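The assignment does not fix a deep learning framework; assuming PyTorch, a small convolutional Q-network for the single-plane grid observation above might look like the sketch below. The layer sizes and depths are illustrative choices, not prescribed values.

```python
import torch
import torch.nn as nn

class DQN(nn.Module):
    """Convolutional Q-network: grid state in, one Q-value per action out."""

    def __init__(self, grid_size=10, n_actions=3):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
        )
        self.head = nn.Sequential(
            nn.Flatten(),
            nn.Linear(32 * grid_size * grid_size, 128), nn.ReLU(),
            nn.Linear(128, n_actions),  # one Q-value per action
        )

    def forward(self, x):
        # x: (batch, grid_size, grid_size) -> add a channel dimension for the conv layers
        if x.dim() == 3:
            x = x.unsqueeze(1)
        return self.head(self.conv(x))
```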

iv. Epsilon-Greedy Exploration: Implement an exploration strategy such as epsilon-greedy to balance exploration (trying new actions) and exploitation (using learned knowledge). [1 Mark]
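One possible epsilon-greedy implementation, with an exponential decay schedule; the start value, floor, and decay rate below are assumptions for illustration.

```python
import random

import torch

def select_action(q_network, state, epsilon, n_actions=3):
    """With probability epsilon take a random action, otherwise the greedy one."""
    if random.random() < epsilon:
        return random.randrange(n_actions)            # explore
    with torch.no_grad():
        state_t = torch.as_tensor(state, dtype=torch.float32).unsqueeze(0)
        return int(q_network(state_t).argmax(dim=1))  # exploit

def decay_epsilon(step, eps_start=1.0, eps_end=0.05, decay=0.995):
    """Exponential decay from eps_start toward the floor eps_end (assumed schedule)."""
    return max(eps_end, eps_start * decay ** step)
```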

v. Training Loop: Initialize the DQN and the target network (a separate network used to stabilize training). In each episode, reset the environment and observe the initial state. [2 Marks]
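Tying the sketches above together, a DQN training loop with a periodically synchronized target network might look like this. The hyperparameters (discount factor, batch size, learning rate, sync interval, episode count) are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

# Assumes SpaceshipEnv, DQN, ReplayBuffer, select_action, decay_epsilon from the sketches above
env = SpaceshipEnv()
policy_net = DQN()
target_net = DQN()
target_net.load_state_dict(policy_net.state_dict())  # start both networks identical
optimizer = torch.optim.Adam(policy_net.parameters(), lr=1e-3)
buffer = ReplayBuffer()

gamma, batch_size, sync_every = 0.99, 64, 500
global_step = 0

for episode in range(500):
    state = env.reset()   # reset the environment and observe the initial state
    done = False
    while not done:
        epsilon = decay_epsilon(global_step)
        action = select_action(policy_net, state, epsilon)
        next_state, reward, done, _ = env.step(action)
        buffer.push(state, action, reward, next_state, done)
        state = next_state
        global_step += 1

        if len(buffer) >= batch_size:
            s, a, r, s2, d = buffer.sample(batch_size)
            s = torch.as_tensor(s, dtype=torch.float32)
            s2 = torch.as_tensor(s2, dtype=torch.float32)
            a = torch.as_tensor(a, dtype=torch.int64)
            r = torch.as_tensor(r, dtype=torch.float32)
            d = torch.as_tensor(d, dtype=torch.float32)

            q = policy_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
            with torch.no_grad():
                # The target network provides stable bootstrap targets
                target = r + gamma * target_net(s2).max(dim=1).values * (1.0 - d)
            loss = F.mse_loss(q, target)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

        if global_step % sync_every == 0:
            target_net.load_state_dict(policy_net.state_dict())
```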

vi. Testing and Evaluation: After training, evaluate the DQN by running it in the environment without exploration (set epsilon to 0). Monitor metrics such as average reward per episode, survival time, etc., to assess the performance. [2 Marks]
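A greedy evaluation pass might be sketched as below; the number of evaluation episodes is an arbitrary assumption, and select_action is the helper from the epsilon-greedy sketch above.

```python
import numpy as np

def evaluate(env, policy_net, n_episodes=20):
    """Run greedy episodes (epsilon = 0) and report average reward and survival time."""
    rewards, lengths = [], []
    for _ in range(n_episodes):
        state, done = env.reset(), False
        total, steps = 0.0, 0
        while not done:
            action = select_action(policy_net, state, epsilon=0.0)  # pure exploitation
            state, reward, done, _ = env.step(action)
            total += reward
            steps += 1
        rewards.append(total)
        lengths.append(steps)
    print(f"avg reward/episode: {np.mean(rewards):.2f}, "
          f"avg survival time: {np.mean(lengths):.1f} steps")
    return np.mean(rewards), np.mean(lengths)
```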

Please provide the complete code-based solution.
